Abstract

A large number of online services provide automated recommendations to help users to navigate 
through a large collection of items. New items (products, videos, songs, advertisements) are suggested 
on the basis of the user's past history and -when available- her demographic profile. Recommendations 
have to satisfy the dual goal of helping the user to explore the space of available items, while allowing 
the system to probe the user's preferences. 

We model this trade-off using linearly parametrized multi-armed bandits, propose a policy and prove 
upper and lower bounds on the cumulative "reward" that coincide up to constants in the data poor 
(high-dimensional) regime. Prior work on linear bandits has focused on the data rich (low-dimensional) 
regime and used cumulative "risk" as the figure of merit. For this data rich regime, we provide a 
simple modification for our policy that achieves near-optimal risk performance under more restrictive 
assumptions on the geometry of the problem. We test (a variation of) the scheme used for establishing 
achievability on the Netflix and MovieLens datasets and obtain good agreement with the qualitative 
predictions of the theory we develop. 

1 Introduction 

Recommendation systems are a key technology for navigating through the ever-growing amount of data that 
is available on the Internet (products, videos, songs, scientific papers, and so on). Recommended items are 
chosen on the basis of the user's past history and have to strike the right balance between two competing 
objectives: 

Serendipity, i.e. allowing accidental pleasant discoveries. This has a positive, albeit hard to quantify, impact on user experience, in that it naturally limits the monotony of recommendations. It also has a quantifiable positive impact on the system, by providing fresh independent information about the user's preferences.

Relevance, i.e. determining the recommendations that are most valued by the user, given her past choices.

While this trade-off is well understood by practitioners, as well as in the data mining literature [SPUP02, ZH08, SW06], rigorous mathematical work has largely focused on the second objective [SJ03, SRJ05, CR09, Gro09, CT10, KMO10a, KMO10b, KLT11]. In this paper we address the first objective, building on recent work on linearly parametrized bandits [DHK08, RT10, AYPS11].

In a simple model, the system recommends items $i(1), i(2), i(3), \dots$ sequentially at times $t \in \{1, 2, 3, \dots\}$. The item index at time $t$ is selected from a large set, $i(t) \in [M] \equiv \{1, \dots, M\}$. Upon viewing (or reading, buying, etc.) item $i(t)$, the user provides feedback $y_t$ to the system. The feedback can be explicit, e.g. a one-to-five-stars rating, or implicit, e.g. the fraction of a video's duration effectively watched by the user. We will assume that $y_t \in \mathbb{R}$, although more general types of feedback also play an important role in practice, and mapping them to real values is sometimes non-trivial.

A large body of literature has developed statistical methods to predict the feedback that a user will provide on a specific item, given past data concerning the same and other users (see the references above). A particularly successful approach uses 'low rank' or 'latent space' models. These models postulate that the rating $y_{i,u}$ provided by user $u$ on item $i$ is approximately given by the scalar product of two feature vectors $\theta_u$ and $x_i \in \mathbb{R}^p$ characterizing, respectively, the user and the item. In formulae

$$ y_{i,u} = \langle x_i, \theta_u \rangle + z_{i,u}, $$

where $\langle a, b \rangle \equiv \sum_{i=1}^{p} a_i b_i$ denotes the standard scalar product, and $z_{i,u}$ captures unexplained factors. The resulting matrix of ratings $y = (y_{i,u})$ is well-approximated by a rank-$p$ matrix.

The items' feature vectors $x_i$ can be either constructed explicitly, or derived from users' feedback using matrix factorization methods. Throughout this paper we will assume that they have been computed in advance using either one of these methods and are hence given. We will use the shorthand $x_t = x_{i(t)}$ for the feature vector of the item recommended at time $t$.

Since the items' feature vectors are known in advance, distinct users can be treated independently, and we will hereafter focus on a single user, with feature vector $\theta$. The vector $\theta$ can encode demographic information known in advance or be computed from the user's feedback. While the model can easily incorporate the former, we will focus on the most interesting case in which no information is known in advance.

We are therefore led to consider the linear bandit model

$$ y_t = \langle x_t, \theta \rangle + z_t, \qquad (1) $$

where, for simplicity, we will assume $z_t \sim N(0, \sigma^2)$ independent of $\theta$, $\{x_i\}_{i=1}^{t}$ and $\{z_i\}_{i=1}^{t-1}$. At each time $t$, the recommender chooses an item feature vector $x_t \in \mathcal{X}_p \subseteq \mathbb{R}^p$, with $\mathcal{X}_p$ the set of feature vectors of the available items. A recommendation policy is a sequence of random variables $\{x_t\}_{t \ge 1}$, $x_t \in \mathcal{X}_p$, wherein $x_{t+1}$ is a function of the past history $\{y_\ell, x_\ell\}_{1 \le \ell \le t}$ (technically, $x_{t+1}$ has to be measurable on $\mathcal{F}_t \equiv \sigma(\{y_\ell, x_\ell\}_{\ell=1}^{t})$). The system is rewarded at time $t$ by an amount equal to the user appreciation $y_t$, and we let $r_t$ denote the expected reward, i.e. $r_t \equiv \mathbb{E}(\langle x_t, \theta \rangle)$.
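
The feedback model (1) can be simulated in a few lines. The sketch below is only an illustration: it assumes the Gaussian prior $N(0, I_p/p)$ on the user vector that is adopted in Section 2, and the function names, the seed, the dimension and the noise level are all hypothetical choices, not taken from the paper.

```python
import math
import random

def draw_user(p, rng):
    """Draw theta ~ N(0, I_p / p), so that E||theta||^2 = 1."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(p)) for _ in range(p)]

def feedback(x, theta, sigma, rng):
    """Return y = <x, theta> + z with z ~ N(0, sigma^2), as in Eq. (1)."""
    signal = sum(xi * ti for xi, ti in zip(x, theta))
    return signal + rng.gauss(0.0, sigma)

rng = random.Random(0)          # fixed seed: illustrative choice
p, sigma = 400, 0.05            # hypothetical dimension and noise level
theta = draw_user(p, rng)
x = [1.0] + [0.0] * (p - 1)     # a unit-norm arm (first coordinate axis)
y = feedback(x, theta, sigma, rng)
sq_norm = sum(t * t for t in theta)   # concentrates around 1 for large p
```

With this scaling a single rating carries signal of variance about $1/p$, which is why many ratings are needed before the estimate of $\theta$ becomes informative.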

As mentioned above, the same linear bandit problem was already studied in several papers, most notably by Rusmevichientong and Tsitsiklis [RT10]. The theory developed in that work, however, has two limitations that are important in the context of recommendation systems. First, the main objective of [RT10] is to construct policies with nearly optimal 'regret', and the focus is on the asymptotic behavior for $t$ large with $p$ constant. In this limit the regret per unit time goes to 0. In a recommendation system, typical dimensions $p$ of the latent feature vector are about 20 to 50 [BK07, Kor08, KBV09]. If the vectors $x_i$ include explicitly constructed features, $p$ can easily become much larger. As a consequence, existing theory requires at least $t > 100$ ratings, which is unrealistic for many recommendation systems and a large number of users.

Second, the policies that have been analyzed in [RT10] are based on an alternation of pure exploration and pure exploitation. In exploration phases, recommendations are completely independent of the user profile. This is somewhat unrealistic (and potentially harmful) in practice because it would translate into a poor user experience. Consequently, we postulate the following desirable properties for a "good" policy:

1. Constant-optimal cumulative reward: For all times $t$, $\sum_{\ell=1}^{t} r_\ell$ is within a constant factor of the maximum achievable reward.

2. Constant-optimal regret: Let the maximum achievable reward be $r^{\rm opt} \equiv \sup_{x \in \mathcal{X}_p} \langle x, \theta \rangle$; then the 'regret' $\sum_{\ell=1}^{t} (r^{\rm opt} - r_\ell)$ is within a constant of the optimal.

3. Approximate monotonicity: For any $0 \le t \le s$, we have $\mathbb{P}\{\langle x_s, \theta \rangle \ge c_1 r_t\} \ge c_2$ for $c_1, c_2$ as close as possible to 1.

We aim, in this paper, to address the first of these objectives in a fairly general setting. In particular, when $t$ is small, say a constant times $p$, we provide matching upper and lower bounds for the cumulative reward under certain mild assumptions on the set of arms $\mathcal{X}_p$. Under more restrictive assumptions on the set of arms $\mathcal{X}_p$, our policy can be extended to achieve near optimal regret as well. Although we will not prove a formal result of the type of Point 3, our policy is an excellent candidate in that respect.

The paper is organized as follows: in Section 2 we formally state our main results. In Section 3 we discuss further related work. Some explanation of the assumptions we make on the set of arms $\mathcal{X}_p$ is provided in Section 4. In Section 5 we present numerical simulations of our policy on synthetic as well as realistic data from the Netflix and MovieLens datasets. We also compare our results with prior work, and in particular with the policy of [RT10]. Finally, proofs are given in Sections 6 and 7.

2 Main results 

We denote by ${\rm Ball}(x; \rho)$ the Euclidean ball in $\mathbb{R}^p$ with radius $\rho$ and center $x \in \mathbb{R}^p$. If $x$ is the origin, we omit this argument and write ${\rm Ball}(\rho)$. Also, we denote the identity matrix as $I_p$.

Our achievability results are based on the following assumption on the set of arms $\mathcal{X}_p$.

Assumption 1. Assume, without loss of generality, $\mathcal{X}_p \subseteq {\rm Ball}(1)$. We further assume that there exists a subset of arms $\mathcal{X}'_p \subseteq \mathcal{X}_p$ such that:

1. For each $x \in \mathcal{X}'_p$ there exists a distribution $\mathbb{P}_x(z)$ supported on $\mathcal{X}_p$ with $\mathbb{E}_x(z) = x$ and $\mathbb{E}_x(z z^{\sf T}) \succeq (\gamma/p) I_p$, for a constant $\gamma > 0$. Here $\mathbb{E}_x(\cdot)$ denotes expectation with respect to $\mathbb{P}_x$.

2. For all $\theta \in \mathbb{R}^p$, $\sup_{x \in \mathcal{X}'_p} \langle x, \theta \rangle \ge \kappa \|\theta\|_2$ for some $\kappa > 0$.

Examples of sets satisfying Assumption 1 and further discussion of its geometrical meaning are deferred to Section 4. Intuitively, it requires that $\mathcal{X}_p$ is 'well spread-out' in the unit ball ${\rm Ball}(1)$.

Following [RT10] we will also assume $\theta \in \mathbb{R}^p$ to be drawn from a Gaussian prior $N(0, I_p/p)$. This roughly corresponds to the assumption that nothing is known a priori about the user except the length of its feature vector, $\|\theta\| \approx 1$. Under this assumption, the scalar product $\langle x_1, \theta \rangle$, where $x_1$ is necessarily independent of $\theta$, is also Gaussian with mean 0 and variance $1/p$ (for a unit-norm arm $x_1$), and hence $\Delta \equiv p\sigma^2$ is the noise-to-signal ratio for the problem. Our results are explicitly computable and apply to any value of $\Delta$. However, they are constant-optimal for $\Delta$ bounded away from zero.

Let $\hat\theta_t$ be the posterior mean estimate of $\theta$ at time $t$, namely

$$ \hat\theta_t \equiv \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2\sigma^2} \sum_{\ell=1}^{t-1} (y_\ell - \langle x_\ell, \theta \rangle)^2 + \frac{p}{2} \|\theta\|^2 \Big\}. \qquad (2) $$

A greedy policy would select the arm $x \in \mathcal{X}_p$ that maximizes the expected one-step reward $\langle x, \hat\theta_t \rangle$. As for the classical multi-armed bandit problem, we would like to combine this approach with random exploration of alternative arms. We will refer to our strategy as SmoothExplore since it combines exploration and exploitation in a continuous manner. This policy is summarized in Algorithm 1.

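Algorithm 1 SmoothExplore

1: initialize $\ell = 1$, $\hat\theta_1 = 0$, $\hat\theta_1/\|\hat\theta_1\| = e_1$, $\Sigma_1 = I_p/p$.
2: repeat
3: Compute: $\tilde x_\ell = \arg\max_{x \in \mathcal{X}'_p} \langle \hat\theta_\ell, x \rangle$.
4: Play: $x_\ell \sim \mathbb{P}_{\tilde x_\ell}(\,\cdot\,)$, observe $y_\ell = \langle x_\ell, \theta \rangle + z_\ell$.
5: Update: $\ell \leftarrow \ell + 1$, $\hat\theta_\ell = \arg\min_{\theta \in \mathbb{R}^p} \frac{1}{2\sigma^2} \sum_{i=1}^{\ell-1} (y_i - \langle x_i, \theta \rangle)^2 + \frac{p}{2} \|\theta\|^2$.
6: until $\ell > t$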



The policy SmoothExplore uses a fixed mixture of exploration and exploitation as prescribed by the probability kernel $\mathbb{P}_x(\,\cdot\,)$. As formalized below, this is constant-optimal in the data poor, high-dimensional regime, hence on small time horizons.

While the focus of this paper is on the data poor regime, it is useful to discuss how the latter blends with the data rich regime that arises on long time horizons. This also clarifies where the boundary between short and long time horizons sits. Of course, one possibility would be to switch to a long-time-horizon policy such as the one of [RT10]. Alternatively, in the spirit of approximate monotonicity, we can try to progressively reduce the random exploration component as $t$ increases. We will illustrate this point for the special case $\mathcal{X}_p = {\rm Ball}(1)$. In that case, we introduce a special case of SmoothExplore, called BallExplore, cf. Algorithm 2. The amount of random exploration at time $t$ is gauged by a parameter $\beta_t$ that decreases from $\beta_1 = \Theta(1)$ to $\beta_t \to 0$ as $t \to \infty$.

Note that, for $t \le p\Delta$, $\beta_t$ is kept constant with $\beta_t = \sqrt{2/3}$. In this regime BallExplore corresponds to SmoothExplore with the choice $\mathcal{X}'_p = \partial{\rm Ball}(1/\sqrt{3})$ (here and below $\partial S$ denotes the boundary of a set $S$). It is not hard to check that this choice of $\mathcal{X}'_p$ satisfies Assumption 1 with $\kappa = 1/\sqrt{3}$ and $\gamma = 2/3$. For further discussion on this point, we refer the reader to Section 4.

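Algorithm 2 BallExplore

1: initialize $\ell = 1$, $\hat\theta_1 = 0$, $\hat\theta_1/\|\hat\theta_1\| = e_1$, $\Sigma_1 = I_p/p$, $P_1^{\perp} = I_p - e_1 e_1^{\sf T}$.
2: repeat
3: Compute: $\tilde x_\ell = \arg\max_{x \in {\rm Ball}(1)} \langle \hat\theta_\ell, x \rangle = \hat\theta_\ell/\|\hat\theta_\ell\|$, $\beta_\ell = \sqrt{2/3}\, \min(p\Delta/\ell, 1)^{1/4}$.
4: Play: $x_\ell = \sqrt{1 - \beta_\ell^2}\, \tilde x_\ell + \beta_\ell P_\ell^{\perp} u_\ell$, where $u_\ell$ is a uniformly sampled unit vector, independent of the past.
5: Observe: $y_\ell = \langle x_\ell, \theta \rangle + z_\ell$.
6: Update: $\ell \leftarrow \ell + 1$, $\hat\theta_\ell = \arg\min_{\theta \in \mathbb{R}^p} \frac{1}{2\sigma^2} \sum_{i=1}^{\ell-1} (y_i - \langle x_i, \theta \rangle)^2 + \frac{p}{2} \|\theta\|^2$, $P_\ell^{\perp} = I_p - \hat\theta_\ell \hat\theta_\ell^{\sf T}/\|\hat\theta_\ell\|^2$.
7: until $\ell > t$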
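
The "Compute" and "Play" steps of BallExplore can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the uniform unit vector is generated by normalising a Gaussian vector, and the estimate $\hat\theta_\ell$ is passed in rather than recomputed.

```python
import math
import random

def beta(ell, p, delta):
    """Exploration weight from Algorithm 2: sqrt(2/3) * min(p*delta/ell, 1)**(1/4)."""
    return math.sqrt(2.0 / 3.0) * min(p * delta / ell, 1.0) ** 0.25

def random_unit_vector(p, rng):
    """Uniform direction on the unit sphere (normalised Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def ball_explore_arm(theta_hat, ell, p, delta, rng):
    """One 'Compute' + 'Play' step: x = sqrt(1-b^2) * x_tilde + b * P_perp u."""
    norm = math.sqrt(sum(v * v for v in theta_hat))
    # Convention from the initialisation line: use e_1 when theta_hat = 0.
    x_tilde = [v / norm for v in theta_hat] if norm > 0 else [1.0] + [0.0] * (p - 1)
    b = beta(ell, p, delta)
    u = random_unit_vector(p, rng)
    inner = sum(ui * xi for ui, xi in zip(u, x_tilde))
    u_perp = [ui - inner * xi for ui, xi in zip(u, x_tilde)]  # P_perp u
    a = math.sqrt(1.0 - b * b)
    return [a * xi + b * vi for xi, vi in zip(x_tilde, u_perp)]

rng = random.Random(1)                 # illustrative seed
p, delta = 30, 1.0                     # values used in the synthetic experiments
arm = ball_explore_arm([0.0] * p, ell=1, p=p, delta=delta, rng=rng)
arm_norm = math.sqrt(sum(v * v for v in arm))
```

Because the exploitation and exploration components are orthogonal, the played arm always satisfies $\|x_\ell\| \le 1$, so it is a valid arm in ${\rm Ball}(1)$.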



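Our main result characterizes the cumulative reward

$$ R_t \equiv \sum_{\ell=1}^{t} r_\ell = \sum_{\ell=1}^{t} \mathbb{E}\{\langle x_\ell, \theta \rangle\}. $$

Theorem 1. Consider the linear bandits problem with $\theta \sim N(0, I_{p \times p}/p)$, $x_t \in \mathcal{X}_p \subseteq {\rm Ball}(1)$ satisfying Assumption 1 and $p\sigma^2 = \Delta$. Further assume that $p \ge 2$ and $p\Delta \ge 2$.

Then there exists a constant $C_1 = C_1(\kappa, \gamma, \Delta)$, bounded for $\kappa$, $\gamma$ and $\Delta$ bounded away from zero, such that SmoothExplore achieves, for $1 < t \le p\Delta$, cumulative reward

$$ R_t \ge C_1\, t^{3/2} p^{-1/2}. $$

Further, the cumulative reward of any strategy is bounded for $1 \le t \le p\Delta$ as:

$$ R_t \le C_2\, t^{3/2} p^{-1/2}. $$

We may take the constants $C_1(\kappa, \gamma, \Delta)$ and $C_2(\Delta)$ to be:

$$ C_1 = \frac{\kappa \sqrt{\Delta}\, C(\gamma, \Delta)}{24\, \alpha(\gamma, \Delta)}, \qquad C_2 = \frac{2}{3\sqrt{\Delta}}, $$

where

$$ C(\gamma, \Delta) = \frac{\gamma}{4(\Delta + 1)}, \qquad \alpha(\gamma, \Delta) = 1 + \Big[ 3 \log\Big( \frac{96}{\Delta\, C(\gamma, \Delta)} \Big) \Big]^{1/2}. $$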
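
To get a feel for the size of the constants in Theorem 1, they can be evaluated numerically. The sketch below simply transcribes the formulas for $C$, $\alpha$, $C_1$ and $C_2$; the function name is hypothetical, and the parameter values are the ones quoted in the text for BallExplore ($\kappa = 1/\sqrt{3}$, $\gamma = 2/3$) together with $\Delta = 1$.

```python
import math

def theorem1_constants(kappa, gamma, delta):
    """Return (C1, C2) from the formulas stated with Theorem 1."""
    c = gamma / (4.0 * (delta + 1.0))
    alpha = 1.0 + math.sqrt(3.0 * math.log(96.0 / (delta * c)))
    c1 = kappa * math.sqrt(delta) * c / (24.0 * alpha)
    c2 = 2.0 / (3.0 * math.sqrt(delta))
    return c1, c2

# BallExplore parameters quoted in the text (kappa = 1/sqrt(3), gamma = 2/3), Delta = 1.
c1, c2 = theorem1_constants(1.0 / math.sqrt(3.0), 2.0 / 3.0, 1.0)
```

For these values the two constants differ by roughly three orders of magnitude, so the bounds match in their dependence on $t$ and $p$ but not in the numerical prefactor.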



In the special case where $\mathcal{X}_p = {\rm Ball}(1)$, we have the following result demonstrating that BallExplore has near-optimal performance in the long time horizon as well.

Theorem 2. Consider the linear bandits problem with $\theta \sim N(0, I_{p \times p}/p)$ in which the set of arms $\mathcal{X}_p$ is the unit ball, i.e. ${\rm Ball}(1)$. Assume $p \ge 2$ and $p\Delta \ge 2$. Then BallExplore achieves for all $t > p\Delta$:

$$ R_t \ge r^{\rm opt}\, t - C_3\, (pt)^{1/2 + \omega(p)}, $$

where:

$$ \omega(p) = \frac{1}{2(p+2)}, \qquad C_3(\Delta) = 70\, \cdots $$

For $t > p\Delta$, we can obtain a matching upper bound by a simple modification of the arguments in [RT10].

Theorem 3 (Rusmevichientong and Tsitsiklis). Under the described model, the cumulative reward of any policy is bounded as follows:

$$ \text{for } t > p\Delta, \qquad R_t \le r^{\rm opt}\, t - \sqrt{pt\Delta} + \cdots $$

The above results characterize a sharp dichotomy between a low-dimensional, data rich regime for $t > p\Delta$ and a high-dimensional, data poor regime for $t \le p\Delta$. In the first case classical theory applies: the reward approaches the oracle performance with a gap of order $\sqrt{pt}$. This behavior is in turn closely related to central limit theorem scaling in asymptotic statistics. Notice that the scaling with $t$ of our upper bound on the risk of BallExplore for large $t$ is suboptimal, namely $(pt)^{1/2 + \omega(p)}$. Since however $\omega(p) = \Theta(1/p)$, the difference can be seen only on exponential time scales $t \ge \exp\{\Theta(p)\}$ and is likely to be irrelevant for moderate to large values of $p$ (see Section 5 for a demonstration). It is an open problem to establish the exact asymptotic scaling of BallExplore.

In the high-dimensional, data poor regime $t \le p\Delta$, the number of observations is smaller than the number of model parameters and the vector $\theta$ can only be partially estimated. Nevertheless, such a partial estimate can be exploited to produce a cumulative reward scaling as $t^{3/2} p^{-1/2}$. In this regime performance is not limited by central limit theorem fluctuations in the estimate of $\theta$. The limiting factor is instead the dimension of the parameter space that can be effectively explored in $t$ steps.

In order to understand this behavior, it is convenient to consider the noiseless case $\sigma = 0$. This is a somewhat degenerate case that, although not covered by the above theorem, yields useful intuition. In the noiseless case, acquiring $t$ observations $y_1, \dots, y_t$ is equivalent to learning the projection of $\theta$ on the $t$-dimensional subspace spanned by $x_1, \dots, x_t$. Equivalently, we learn $t$ coordinates of $\theta$ in a suitable basis. Since the mean square value of each component of $\theta$ is $1/p$, this yields an estimate $\hat\theta_t$ (the restriction of $\theta$ to these coordinates) with $\mathbb{E}\|\hat\theta_t\|^2 = t/p$. By selecting $x_t$ in the direction of $\hat\theta_t$ we achieve instantaneous reward $r_t \approx \sqrt{t/p}$ and hence cumulative reward $R_t = \Theta(t^{3/2} p^{-1/2})$ as stated in the theorem.
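
The last step of this heuristic, summing $\sqrt{\ell/p}$ over $\ell \le t$, can be checked numerically against its integral approximation $(2/3)\, t^{3/2} p^{-1/2}$. The snippet below is a small sanity check of that arithmetic only; the function names are hypothetical and the value $p = 30$ is borrowed from the numerical section.

```python
import math

def heuristic_cumulative_reward(t, p):
    """Sum of the heuristic instantaneous rewards sqrt(l/p), l = 1..t."""
    return sum(math.sqrt(l / p) for l in range(1, t + 1))

def leading_term(t, p):
    """Integral approximation (2/3) t^{3/2} p^{-1/2} of the sum above."""
    return (2.0 / 3.0) * t ** 1.5 / math.sqrt(p)

p = 30
ratios = {t: heuristic_cumulative_reward(t, p) / leading_term(t, p) for t in (10, 30, 300)}
```

The ratio stays slightly above 1 and approaches 1 as $t$ grows, which is the sense in which the cumulative reward scales as $t^{3/2} p^{-1/2}$.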

3 Related work 

Auer [Aue02] first considered a model similar to ours, wherein the parameter $\theta$ and noise $z_t$ are bounded almost surely. This work assumes $\mathcal{X}_p$ finite and introduces an algorithm based on upper confidence bounds. Dani et al. [DHK08] extended the policy of [Aue02] to arbitrary compact decision sets $\mathcal{X}_p$. For finite sets, [DHK08] prove an upper bound on the regret that is logarithmic in the cardinality $|\mathcal{X}_p|$, while for continuous sets they prove an upper bound of $O(\sqrt{pt}\, \log^{3/2} t)$. This result was further improved by logarithmic factors in [AYPS11]. The common theme throughout this line of work is the use of upper confidence bounds and least-squares estimation. The algorithms typically construct ellipsoidal confidence sets around the least-squares estimate $\hat\theta$ which, with high probability, contain the parameter $\theta$. The algorithm then optimistically chooses the arm that appears best with respect to this ellipsoid. As the confidence ellipsoids are initialized to be large, the bounds are only useful for $t \gg p$. In particular, in the high-dimensional data-poor regime $t = O(p)$, the bounds typically become trivial. In light of Theorem 3 this is not surprising. Even after normalizing the noise-to-signal ratio while scaling the dimension, the $O(\sqrt{pt})$ dependence of the risk is relevant only for large time scales of $t > p\Delta$. This is the regime in which the parameter $\theta$ has been estimated fairly well.

Rusmevichientong and Tsitsiklis [RT10] propose a phased policy which operates in distinct phases of learning the parameter $\theta$ and earning based on the current estimate of $\theta$. Although this approach yields order optimal bounds for the regret, it suffers from the same shortcomings as confidence-ellipsoid based algorithms. In fact, [RT10] also consider a more general policy based on confidence bounds and prove a $O(\sqrt{pt}\, \log^{3/2} t)$ bound on the regret.

Our approach to the problem is significantly different and does not rely on confidence bounds. It would be interesting to understand whether the techniques developed here can be used to improve the confidence bounds method.

Footnote (on the large-$t$ scaling of BallExplore discussed in Section 2): Simulations suggest that the upper bound $(pt)^{1/2 + \omega(p)}$ might be tight.

4 On Assumption 1

The geometry of the set of arms $\mathcal{X}_p$ is an important factor in the performance of any policy. For instance, [RT10], [DHK08] and [AYPS11] provide "problem-dependent" bounds on the regret incurred in terms of the difference between the reward of the optimal arm and the next-optimal arm. This characterization is reasonable in the long time horizon: if the posterior estimate $\hat\theta_t$ of the feature vector $\theta$ coincided with $\theta$ itself, only the optimal arm would matter. Since the posterior estimate converges to $\theta$ in the limit of large $t$, the local geometry of $\mathcal{X}_p$ around the optimal arm dictates the asymptotic behavior of the regret.

In the high-dimensional, short-time regime, the global geometry of $\mathcal{X}_p$ plays instead a crucial role. This is quantified in our results through the parameters $\kappa$ and $\gamma$ appearing in Assumption 1. Roughly speaking, this amounts to requiring that $\mathcal{X}_p$ is 'spread out' in the unit ball. It is useful to discuss this intuition in a more precise manner. For the proofs of statements in this section we refer to Appendix A.

A simple case is the one in which the arm set contains a ball.

Lemma 4.1. If ${\rm Ball}(\rho) \subseteq \mathcal{X}_p \subseteq {\rm Ball}(1)$, then $\mathcal{X}_p$ satisfies Assumption 1 with $\kappa = \rho/\sqrt{3}$, $\gamma = 2\rho^2/3$.

The last lemma does not cover the interesting case in which $\mathcal{X}_p$ is finite. The next result shows however that, for Assumption 1.2 to hold, it is sufficient that the closure of the convex hull of $\mathcal{X}'_p$, denoted by ${\rm conv}(\mathcal{X}'_p)$, contains a ball.

Proposition 4.2. Assumption 1.2 holds if and only if ${\rm Ball}(\kappa) \subseteq {\rm conv}(\mathcal{X}'_p)$.

In other words, Assumption 1.2 is satisfied if $\mathcal{X}'_p$ is 'spread out' in all directions around the origin.

Finally, we consider a concrete example with $\mathcal{X}_p$ finite. Let $x_1, x_2, \dots, x_M$ be i.i.d. uniformly random in ${\rm Ball}(1)$. We then refer to the set of arms $\mathcal{X}_p \equiv \{x_1, x_2, \dots, x_M\}$ as a uniform cloud.

Proposition 4.3. A uniform cloud $\mathcal{X}_p$ in dimension $p \ge 20$ satisfies Assumption 1 with $M = 8^p$, $\kappa = 1/4$ and $\gamma = 1/32$ with probability larger than $1 - 2\exp(-p)$.

5 Numerical results 

We will mainly compare our results with those of [RT10], since the results of that paper directly apply to the present problem. The authors proposed a phased exploration/exploitation policy, wherein they separate the phases of learning the parameter $\theta$ (exploration) and earning reward based on the current estimate of $\theta$ (exploitation).

In Figure 1 we plot the cumulative reward and the cumulative risk incurred by our policy and the phased policy, as well as analytical bounds thereof. We generated $\theta \sim N(0, I_p)$ randomly for $p = 30$, and produced observations $y_t$, $t \in \{1, 2, 3, \dots\}$ according to the general model (1) with $\Delta = p\sigma^2 = 1$ and arm set $\mathcal{X}_p = {\rm Ball}(1)$. The curves presented here are averages over $n = 5000$ realizations and statistical fluctuations are negligible.

The left frame illustrates the performance of SmoothExplore in the data poor (high-dimensional) regime $t < 2p\Delta$. We compare the cumulative reward $R_t$ as achieved in simulations with that of the phased policy of [RT10] and with the theoretical upper bound of Theorem 1 (and Theorem 3 for $t > p\Delta$). In the right frame we consider instead the data rich (low-dimensional) regime $t \gg p\Delta$. In this case it is more convenient to plot the cumulative risk $t\, r^{\rm opt} - R_t$. We plot the curves corresponding to the ones in the left frame, as well as the upper bound (lower bound on the reward) from Theorems 1 and 2.

Figure 1: Left frame: Cumulative reward $R_t$ in the data poor regime $t < 2p\Delta$ (here $p = 30$, $\Delta = 1$) as obtained through numerical simulations over synthetic data, together with analytical upper bound. Right frame: Cumulative risk in the data rich regime $t \gg p\Delta$ (again, $p = 30$, $\Delta = 1$).

Note that the $O(\sqrt{pt})$ behavior of the risk of the phased policy can be observed only for $t > 1000$. On the other hand, our policy displays the correct behavior for both time scales. The extra $\omega(p) = \Theta(1/p)$ factor in the exponent yields a multiplicative factor larger than 2 only for $t \ge 2^{2(p+2)} \approx 2 \cdot 10^{19}$.
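
The threshold quoted above follows from solving $v^{\omega(p)} = 2$ with $\omega(p) = 1/(2(p+2))$, which gives $v = 2^{2(p+2)}$. The short computation below, with hypothetical function names, evaluates it exactly for $p = 30$ using integer arithmetic.

```python
from fractions import Fraction

def omega(p):
    """Exponent correction omega(p) = 1 / (2 (p + 2)) from Theorem 2."""
    return Fraction(1, 2 * (p + 2))

def doubling_threshold(p):
    """Value v at which v ** omega(p) reaches 2, i.e. v = 2 ** (2 (p + 2))."""
    return 2 ** (2 * (p + 2))

p = 30
v = doubling_threshold(p)     # 2 ** 64 for p = 30
```

For $p = 30$ this is $2^{64}$, about $1.8 \cdot 10^{19}$, far beyond any horizon relevant to a recommendation system.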

The above set of numerical experiments used $\mathcal{X}_p = {\rm Ball}(1)$. For applications to recommendation systems, $\mathcal{X}_p$ is in correspondence with a certain catalogue of available products or contents. In particular, $\mathcal{X}_p$ is expected to be finite. It is therefore important to check how SmoothExplore performs for realistic sets of arms. We plot results obtained with the Netflix Prize dataset and the MovieLens 1M dataset in Figure 2. Here the feature vectors $x_i$ for movies are obtained using the matrix completion algorithm of [KMO10b]. The user parameter vectors $\theta_u$ were obtained by regressing the ratings against the movie feature vectors (the average user rating $a_u$ was subtracted). Similar to synthetic data, we took $p = 30$. Regression also yields an estimate for the noise variance, which is assumed known in the algorithm. We then simulated an interactive scenario by postulating that the rating of user $u$ for movie $i$ is given by

$$ \tilde y_{i,u} = {\rm Quant}(a_u + \langle x_i, \theta_u \rangle), $$

where ${\rm Quant}(z)$ quantizes $z$ to $\{1, 2, \dots, 5\}$ (corresponding to a one-to-five star rating). The feedback used for our simulation is the centered rating $y_{i,u} = \tilde y_{i,u} - a_u$.

We implement a slightly modified version of SmoothExplore for these simulations. At each time we compute the ridge regression estimate of the user feature vector $\hat\theta_t$ as before and choose the "best" movie $\tilde x_t = \arg\max_{x \in \mathcal{X}_p} \langle x, \hat\theta_t \rangle$, assuming our estimate is error free. We then construct the ball in $\mathbb{R}^p$ with center $\tilde x_t$ and radius $\beta_t$. We list all the movies whose feature vectors fall in this ball, and recommend a uniformly randomly chosen one in this list.
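
The selection rule just described can be sketched for a finite catalogue as follows. This is an illustrative reading of the text, not the code used for the experiments: the function name, the toy catalogue and the radius are hypothetical, and the ridge estimate is simply passed in as `theta_hat`.

```python
import math
import random

def recommend(items, theta_hat, radius, rng):
    """Pick the greedy item, then sample uniformly among items within `radius` of it."""
    def score(x):
        return sum(a * b for a, b in zip(x, theta_hat))
    best = max(range(len(items)), key=lambda i: score(items[i]))
    centre = items[best]
    neighbours = [i for i, x in enumerate(items) if math.dist(x, centre) <= radius]
    return best, rng.choice(neighbours)

# Hypothetical toy catalogue in p = 2 dimensions.
items = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (-1.0, 0.0)]
rng = random.Random(0)
best, chosen = recommend(items, theta_hat=(1.0, 0.0), radius=0.2, rng=rng)
```

With radius 0 the rule reduces to the greedy choice, and a larger radius mixes in more exploration around the currently best item.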

Classical bandit theory implies the reward behavior is of the type $c_1 t - c_2 \sqrt{t}$, where $c_1$ and $c_2$ are (dimension-dependent) constants. Figure 2 presents the best fit of this type for $t < 2p$. The description appears to be qualitatively incorrect in this regime. Indeed, in this regime, the reward behavior is better explained by a $c_3 t^{3/2}$ curve. These results suggest that our policy is fairly robust to the significant modeling uncertainty inherent in the problem. In particular, the fact that the "noise" encountered in practice is manifestly non-Gaussian does not affect the qualitative predictions of Theorem 1.

A full validation of our approach would require an actual interactive realization of a recommendation system [DM13]. Unfortunately, such validation cannot be provided by existing datasets, such as the ones used here. A naive approach would be to use the actual ratings as the feedback $y_{i,u}$, but this suffers from many shortcomings. First of all, each user rates a sparse subset (of the order of 100 movies) of the whole database of movies, and hence any policy to be tested would be heavily constrained and distorted. Second, the set of rated movies is a biased subset (since it is selected by the user herself).

Figure 2: Results using the Netflix (left frame) and MovieLens 1M (right frame) datasets. SmoothExplore is effective in learning the user's preferences and is well described by the predicted behavior of Theorem 1.

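6 Proof of Theorem 1

We begin with some useful notation. Define the $\sigma$-algebra $\mathcal{F}_t \equiv \sigma(\{y_\ell, x_\ell\}_{\ell=1}^{t})$. Also let $\mathcal{G}_t \equiv \sigma(\{y_\ell\}_{\ell=1}^{t-1}, \{x_\ell\}_{\ell=1}^{t})$. We let $\hat\theta_t$ and $\Sigma_t$ denote the posterior mean and covariance of $\theta$ given $t - 1$ observations. Since $\theta$ is Gaussian and the observations are linear, it is a standard result that these can be computed as:

$$ \Sigma_t \equiv {\rm Cov}(\theta \mid \mathcal{F}_{t-1}) = \Big( p\, I_p + \frac{1}{\sigma^2} \sum_{\ell=1}^{t-1} x_\ell x_\ell^{\sf T} \Big)^{-1}, $$

$$ \hat\theta_t \equiv \mathbb{E}(\theta \mid \mathcal{F}_{t-1}) = \Sigma_t \Big( \frac{1}{\sigma^2} \sum_{\ell=1}^{t-1} y_\ell x_\ell \Big). $$

Note that since $\theta$ is Gaussian and the measurements are linear, the posterior mean coincides with the maximum a posteriori (regularized least-squares) estimate for $\theta$. This ensures our notation is consistent.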
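
In practice the posterior mean and covariance need not be recomputed from scratch: each new observation changes the inverse covariance by a rank-one term, so the Sherman-Morrison formula gives an update in $O(p^2)$ operations. The following sketch, with a hypothetical function name and illustrative numbers, implements one such update in plain Python and can be compared with the closed-form expressions for $\Sigma_t$ and $\hat\theta_t$.

```python
def posterior_update(theta_hat, cov, x, y, sigma2):
    """Rank-one Bayesian update of (mean, covariance) after observing (x, y)."""
    p = len(x)
    cx = [sum(cov[i][j] * x[j] for j in range(p)) for i in range(p)]   # Sigma x
    denom = sigma2 + sum(x[i] * cx[i] for i in range(p))                # sigma^2 + x' Sigma x
    new_cov = [[cov[i][j] - cx[i] * cx[j] / denom for j in range(p)] for i in range(p)]
    resid = y - sum(x[i] * theta_hat[i] for i in range(p))              # innovation
    new_theta = [theta_hat[i] + cx[i] * resid / denom for i in range(p)]
    return new_theta, new_cov

p, sigma2 = 2, 0.5                                   # illustrative values
theta1 = [0.0, 0.0]
cov1 = [[1.0 / p, 0.0], [0.0, 1.0 / p]]              # prior covariance I_p / p
theta2, cov2 = posterior_update(theta1, cov1, x=[1.0, 0.0], y=0.8, sigma2=sigma2)
```

In this toy case the closed form gives $\Sigma_2 = {\rm diag}(1/(p + 1/\sigma^2), 1/p) = {\rm diag}(0.25, 0.5)$ and $\hat\theta_2 = \Sigma_2\, y_1 x_1/\sigma^2 = (0.4, 0)$, which is what the recursive update returns.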

6.1 Upper bound on reward 

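At time $\ell$, the expected reward satisfies $r_\ell = \mathbb{E}(\langle x_\ell, \theta \rangle) \le \mathbb{E}(\|\hat\theta_\ell\|) \le [\mathbb{E}(\|\hat\theta_\ell\|^2)]^{1/2}$, where the first inequality follows from Cauchy-Schwarz, the fact that $\hat\theta_\ell$ is unbiased, and $\|x_\ell\| \le 1$. Since $1 = \mathbb{E}(\|\theta\|^2) = \mathbb{E}(\|\hat\theta_\ell\|^2) + \mathbb{E}({\rm Tr}\, \Sigma_\ell)$:

$$ r_\ell^2 \le 1 - \mathbb{E}({\rm Tr}(\Sigma_\ell)). \qquad (3) $$

We have, applying Jensen's inequality and further simplification:

$$ \mathbb{E}\, {\rm Tr}(\Sigma_\ell) \ge \frac{p^2}{\mathbb{E}\, {\rm Tr}(\Sigma_\ell^{-1})} = \frac{p^2}{\mathbb{E}\, {\rm Tr}\big( p I_p + \sigma^{-2} \sum_{j=1}^{\ell-1} x_j x_j^{\sf T} \big)} \ge \frac{1}{1 + (\ell - 1)/(p\sigma)^2}. $$

Using this to bound the right hand side of Eq. (3):

$$ r_\ell^2 \le 1 - \frac{1}{1 + (\ell - 1)/(p\sigma)^2} = \frac{(\ell - 1)/p}{(\ell - 1)/p + p\sigma^2} \le \frac{1}{p\sigma^2}\, \frac{\ell - 1}{p}. $$

The cumulative reward can then be bounded as follows:

$$ R_t = \sum_{\ell=1}^{t} r_\ell \le \frac{1}{\sqrt{p\sigma^2}} \sum_{\ell=1}^{t} \sqrt{\frac{\ell - 1}{p}} \le \frac{2}{3\sqrt{p\sigma^2}}\, t^{3/2} p^{-1/2} = C_2(\Delta)\, t^{3/2} p^{-1/2}. $$

Here we define $C_2(\Delta) \equiv 2/(3\sqrt{\Delta})$.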
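
The two relaxations used in this argument (dropping the denominator in the bound on $r_\ell^2$, and replacing the sum of square roots by an integral) can be checked numerically. The sketch below uses hypothetical function names and the illustrative values $p = 30$, $\Delta = 1$, $t = p\Delta$.

```python
import math

def reward_sq_bound_exact(ell, p, sigma2):
    """Bound on r_ell^2 obtained from Eq. (3): 1 - 1 / (1 + (ell-1) / (p^2 sigma^2))."""
    return 1.0 - 1.0 / (1.0 + (ell - 1) / (p * p * sigma2))

def reward_sq_bound_relaxed(ell, p, sigma2):
    """Relaxed form (ell - 1) / (p * Delta), with Delta = p * sigma^2."""
    return (ell - 1) / (p * (p * sigma2))

def cumulative_upper_bound(t, p, sigma2):
    """C_2(Delta) * t^{3/2} * p^{-1/2} with C_2 = 2 / (3 sqrt(Delta))."""
    delta = p * sigma2
    return 2.0 / (3.0 * math.sqrt(delta)) * t ** 1.5 / math.sqrt(p)

p, sigma2, t = 30, 1.0 / 30, 30     # Delta = 1, horizon t = p * Delta
partial = sum(math.sqrt(reward_sq_bound_relaxed(l, p, sigma2)) for l in range(1, t + 1))
```

At $t = p\Delta = 30$ the summed per-step bound is close to, and below, the closed-form value $C_2\, t^{3/2} p^{-1/2} = 20$, so little is lost in the integral approximation.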



6.2 Lower bound on reward 

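We compute the expected reward earned by SmoothExplore at time $t$ as:

$$ r_t = \mathbb{E}(\langle x_t, \theta \rangle) = \mathbb{E}\big( \mathbb{E}(\langle x_t, \theta \rangle \mid \mathcal{G}_t) \big) = \mathbb{E}\big( \mathbb{E}(\langle x_t, \hat\theta_t \rangle \mid \mathcal{F}_{t-1}) \big) = \mathbb{E}(\langle \tilde x_t, \hat\theta_t \rangle) \ge \kappa\, \mathbb{E}(\|\hat\theta_t\|). \qquad (4) $$

Here the next-to-last equality uses $\mathbb{E}_{\tilde x_t}(x_t) = \tilde x_t$, and the final inequality uses the second part of Assumption 1.

The following lemma guarantees that $\mathbb{E}\|\hat\theta_t\|$ is $\Omega(\sqrt{t})$.

Lemma 6.1. Under the conditions of Theorem 1 we have, for all $t > 1$:

$$ \mathbb{E}\|\hat\theta_t\| \ge C'(\gamma, \Delta)\, t^{1/2} p^{-1/2}. $$

Here:

$$ C'(\gamma, \Delta) = \frac{1}{2}\, \frac{C(\gamma, \Delta)}{\alpha(\gamma, \Delta)} \sqrt{\frac{\Delta}{8}}, \quad \text{where} \quad C(\gamma, \Delta) = \frac{\gamma}{4(\Delta + 1)}, \qquad \alpha(\gamma, \Delta) = 1 + \Big[ 3 \log\Big( \frac{96}{\Delta\, C(\gamma, \Delta)} \Big) \Big]^{1/2}. $$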
Using this lemma we proceed by bounding the right side of Eq. (4):

$$ r_t \ge \kappa\, C' \sqrt{\frac{t - 1}{p}}. $$

Computing the cumulative reward $R_t$ we have:

$$ R_t = \sum_{\ell=1}^{t} r_\ell \ge \kappa\, C' \sum_{\ell=1}^{t} \sqrt{\frac{\ell - 1}{p}} \ge \frac{2}{3}\, \kappa\, C'\, (t - 1)^{3/2} p^{-1/2} \ge \frac{\kappa\, C'}{3\sqrt{2}}\, t^{3/2} p^{-1/2}. $$

Thus, letting $C_1(\kappa, \gamma, \Delta) = \kappa\, C'(\gamma, \Delta)/(3\sqrt{2})$, we have the required result.

6.3 Proof of Lemma 6.1

In order to prove that $\mathbb{E}(\|\hat\theta_t\|) = \Omega(\sqrt{t})$, we will first show that $\mathbb{E}(\|\hat\theta_t\|^2)$ is $\Omega(t)$. Then we prove that $\|\hat\theta_t\|$ is sub-Gaussian, and use this to arrive at the required result.

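Lemma 6.2 (Growth of Second Moment). Under the conditions of Theorem 1,

$$ \mathbb{E}\|\hat\theta_t\|^2 \ge C(\Delta, \gamma)\, \frac{t - 1}{p}, \quad \text{where} \quad C(\Delta, \gamma) = \frac{\gamma}{4(\Delta + 1)}. $$

Proof. We rewrite $\hat\theta_t$ using the following inductive form:

$$ \hat\theta_{t+1} = \hat\theta_t + \Sigma_{t+1} \Big( \frac{1}{\sigma^2} x_t x_t^{\sf T} \Big) v_t + \Sigma_{t+1} \frac{z_t}{\sigma^2} x_t. \qquad (5) $$

Here $v_t \equiv \theta - \hat\theta_t$ is a random zero mean vector. Conditional on $\mathcal{F}_{t-1}$, $v_t$ is distributed as $N(0, \Sigma_t)$ and is independent of $x_t$ and $z_t$. Recall that the $\sigma$-algebra $\mathcal{G}_t = \sigma(\{y_\ell\}_{\ell=1}^{t-1}, \{x_\ell\}_{\ell=1}^{t}) \supseteq \mathcal{F}_{t-1}$. Then we have:

$$ \mathbb{E}(\|\hat\theta_{t+1}\|^2 \mid \mathcal{G}_t) = \|\hat\theta_t\|^2 + \frac{1}{\sigma^4} \mathbb{E}\big[ v_t^{\sf T} (\Sigma_{t+1} x_t x_t^{\sf T})^{\sf T} (\Sigma_{t+1} x_t x_t^{\sf T}) v_t \mid \mathcal{G}_t \big] + \frac{1}{\sigma^4} \mathbb{E}[z_t^2 \mid \mathcal{G}_t]\, (\Sigma_{t+1} x_t)^{\sf T} (\Sigma_{t+1} x_t). \qquad (6) $$

The cross terms cancel since $v_t$ and $z_t$, conditionally on $\mathcal{G}_t$, are independent and zero mean. The expectation in the second term can be reduced as follows:

$$ \mathbb{E}\big[ v_t^{\sf T} (\Sigma_{t+1} x_t x_t^{\sf T})^{\sf T} (\Sigma_{t+1} x_t x_t^{\sf T}) v_t \mid \mathcal{G}_t \big] = {\rm Tr}\big[ (\Sigma_{t+1} x_t x_t^{\sf T})\, \Sigma_t\, (\Sigma_{t+1} x_t x_t^{\sf T})^{\sf T} \big] = (x_t^{\sf T} \Sigma_t x_t)\, {\rm Tr}\big[ \Sigma_{t+1} x_t x_t^{\sf T} \Sigma_{t+1} \big] = (x_t^{\sf T} \Sigma_t x_t)(x_t^{\sf T} \Sigma_{t+1}^2 x_t). $$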
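The third term can be seen to be:

$$ \mathbb{E}[z_t^2 \mid \mathcal{G}_t]\, (\Sigma_{t+1} x_t)^{\sf T} (\Sigma_{t+1} x_t) = \sigma^2\, x_t^{\sf T} \Sigma_{t+1}^2 x_t. $$

Thus we have, continuing Eq. (6):

$$ \mathbb{E}(\|\hat\theta_{t+1}\|^2 \mid \mathcal{G}_t) = \|\hat\theta_t\|^2 + \frac{1}{\sigma^4} (\sigma^2 + x_t^{\sf T} \Sigma_t x_t)(x_t^{\sf T} \Sigma_{t+1}^2 x_t). \qquad (7) $$

Since $\Sigma_{t+1} = (\Sigma_t^{-1} + \sigma^{-2} x_t x_t^{\sf T})^{-1} = \Sigma_t - \Sigma_t x_t x_t^{\sf T} \Sigma_t / (\sigma^2 + x_t^{\sf T} \Sigma_t x_t)$, some calculation yields that:

$$ x_t^{\sf T} \Sigma_{t+1}^2 x_t = \frac{\sigma^4\, (x_t^{\sf T} \Sigma_t^2 x_t)}{(\sigma^2 + x_t^{\sf T} \Sigma_t x_t)^2}. $$

Thus Eq. (7) reduces to

$$ \mathbb{E}(\|\hat\theta_{t+1}\|^2 \mid \mathcal{G}_t) = \|\hat\theta_t\|^2 + \frac{x_t^{\sf T} \Sigma_t^2 x_t}{\sigma^2 + x_t^{\sf T} \Sigma_t x_t}. \qquad (8) $$

We now bound the additive term in Eq. (8). We know that $\Sigma_t \preceq I_p/p$ (the prior covariance), thus $x_t^{\sf T} \Sigma_t x_t \le 1/p$ since $x_t \in \mathcal{X}_p \subseteq {\rm Ball}(1)$. Hence the denominator in Eq. (8) is upper bounded by $\sigma^2 + 1/p$. To bound the numerator:

$$ \mathbb{E}[x_t^{\sf T} \Sigma_t^2 x_t \mid \mathcal{F}_{t-1}] = \mathbb{E}[{\rm Tr}(\Sigma_t^2 x_t x_t^{\sf T}) \mid \mathcal{F}_{t-1}] = {\rm Tr}[\Sigma_t^2\, \mathbb{E}(x_t x_t^{\sf T} \mid \mathcal{F}_{t-1})] \ge \frac{\gamma}{p}\, {\rm Tr}(\Sigma_t^2), $$

since $\mathbb{E}_{\tilde x_t}(x_t x_t^{\sf T}) \succeq (\gamma/p) I_p$ by Assumption 1. Using this in Eq. (8), we take expectations to get:

$$ \mathbb{E}(\|\hat\theta_{t+1}\|^2) \ge \mathbb{E}(\|\hat\theta_t\|^2) + \frac{\gamma}{\Delta + 1}\, \mathbb{E}[{\rm Tr}(\Sigma_t^2)]. \qquad (9) $$

Considering the second term in Eq. (9):

$$ \mathbb{E}[{\rm Tr}(\Sigma_t^2)] \ge p\, \mathbb{E}[\det(\Sigma_t^2)^{1/p}] = p\, \mathbb{E}\Big[ \prod_{j=1}^{p} \Big( p + \frac{\lambda_j}{\sigma^2} \Big)^{-2/p} \Big], $$

where $\lambda_j$ is the $j$-th eigenvalue of $\sum_{\ell=1}^{t-1} x_\ell x_\ell^{\sf T}$. Continuing the chain of inequalities:

$$ \mathbb{E}[{\rm Tr}(\Sigma_t^2)] \ge \frac{1}{p}\, \mathbb{E}\Big[ \prod_{j=1}^{p} \Big( 1 + \frac{\lambda_j}{\Delta} \Big)^{-2/p} \Big] \ge \frac{1}{p}\, \mathbb{E}\Big[ \prod_{j=1}^{p} \exp\Big( -\frac{2\lambda_j}{p\Delta} \Big) \Big] = \frac{1}{p}\, \mathbb{E}\Big[ \exp\Big\{ -\frac{2}{p\Delta}\, {\rm Tr}\Big( \sum_{\ell=1}^{t-1} x_\ell x_\ell^{\sf T} \Big) \Big\} \Big] \ge \frac{1}{p} \exp\Big\{ -\frac{2(t-1)}{p\Delta} \Big\}, $$

where the last inequality follows from the fact that $x_\ell \in {\rm Ball}(1)$ for each $\ell$. Combining this with Eq. (9) gives:

$$ \mathbb{E}(\|\hat\theta_{t+1}\|^2) \ge \mathbb{E}(\|\hat\theta_t\|^2) + \frac{\gamma}{p(\Delta + 1)} \exp\Big\{ -\frac{2(t-1)}{p\Delta} \Big\}. \qquad (10) $$

Summing over $t$ this implies

$$ \mathbb{E}[\|\hat\theta_t\|^2] \ge \frac{\gamma}{p(\Delta + 1)}\, \frac{1 - \exp\{-2(t-1)/p\Delta\}}{1 - \exp\{-2/p\Delta\}} \ge \frac{\gamma\Delta}{2(\Delta + 1)} \big( 1 - \exp\{-2(t-1)/p\Delta\} \big) \ge \frac{\gamma\Delta}{2(\Delta + 1)} \big( 1 - \exp\{-2(p\Delta - 1)/p\Delta\} \big) \frac{t - 1}{p\Delta - 1}. $$

The last inequality follows from the fact that $1 - \exp(-z)$ is concave in $z$. Using $p\Delta \ge 2$, we obtain:

$$ \mathbb{E}[\|\hat\theta_t\|^2] \ge \frac{\gamma (1 - e^{-1})}{2(\Delta + 1)}\, \frac{t - 1}{p} \ge \frac{\gamma}{4(\Delta + 1)}\, \frac{t - 1}{p}. $$

□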
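
The final chain of inequalities can be verified numerically by summing the increments of Eq. (10) directly and comparing with the bound of Lemma 6.2 over the range $1 \le t \le p\Delta$. The function names and parameter values below are illustrative assumptions ($\gamma = 2/3$ is the value quoted for BallExplore).

```python
import math

def second_moment_lower_bound(t, p, delta, gamma):
    """Sum of the increments from Eq. (10), starting from theta_hat_1 = 0."""
    coeff = gamma / (p * (delta + 1.0))
    return coeff * sum(math.exp(-2.0 * (l - 1) / (p * delta)) for l in range(1, t))

def lemma_62_bound(t, p, delta, gamma):
    """C(Delta, gamma) * (t - 1) / p with C = gamma / (4 (Delta + 1))."""
    return gamma / (4.0 * (delta + 1.0)) * (t - 1) / p

p, delta, gamma = 30, 1.0, 2.0 / 3.0   # illustrative values (gamma as for BallExplore)
gaps = [second_moment_lower_bound(t, p, delta, gamma) - lemma_62_bound(t, p, delta, gamma)
        for t in range(1, int(p * delta) + 1)]
```

All gaps are non-negative on this range, consistent with the lemma; the factor $1/4$ in $C(\Delta, \gamma)$ is therefore somewhat conservative for these parameters.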

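Lemma 6.3 (Sub-Gaussianity of $\|\hat\theta_t\|$). Under the conditions of Theorem 1,

$$ \mathbb{P}\Big( \|\hat\theta_t\| \ge \sqrt{\frac{8(t-1)}{p\Delta}}\; \nu \Big) \le e^{-(\nu - 1)^2/3}. $$

Proof. Note that $\hat\theta_t$ is a (vector-valued) martingale. The associated difference sequence is given by (cf. Eq. (5))

$$ \xi_t = \frac{\langle v_t, x_t \rangle + z_t}{\sigma^2}\, \Sigma_{t+1} x_t. $$

Note that $\hat\theta_t = \sum_{\ell=1}^{t-1} \xi_\ell$. We have that $\mathbb{E}(\xi_t \mid \mathcal{F}_{t-1}) = 0$. Then, conditionally on $\mathcal{G}_t$, $\|\xi_t\| = |w_t|$, where $w_t = \frac{\langle v_t, x_t \rangle + z_t}{\sigma^2} \|\Sigma_{t+1} x_t\|$ is Gaussian with variance given by:

$$ {\rm Var}(w_t \mid \mathcal{G}_t) = \frac{\sigma^2 + x_t^{\sf T} \Sigma_t x_t}{\sigma^4}\, x_t^{\sf T} \Sigma_{t+1}^2 x_t = \frac{x_t^{\sf T} \Sigma_t^2 x_t}{\sigma^2 + x_t^{\sf T} \Sigma_t x_t} \le \frac{1}{p\Delta}, $$

since $0 \preceq \Sigma_t \preceq I_p/p$ and $\|x_t\| \le 1$. Thus, we have the following "light-tail" condition on $\xi_t$:

$$ \mathbb{E}\big( e^{\lambda \|\xi_t\|^2} \mid \mathcal{G}_t \big) \le \Big( 1 - \frac{2\lambda}{p\Delta} \Big)^{-1/2}. $$

Using $\lambda = p\Delta/4$, we obtain:

$$ \mathbb{E}\big( e^{p\Delta \|\xi_t\|^2/4} \mid \mathcal{G}_t \big) \le \sqrt{2} \le e. $$

Now Theorem 2.1 in [JN08], applied to this martingale, implies the lemma.

□

We can now prove Lemma 6.1. We have: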



E[||0 t || 2 ]=E[||0 t 



*ll R \\e t p>a\ 



<V^n\\9t\\}+ n\\9t\\ 2 > y)dy. 



Here we use the fact that \\9t\\ is a positive random variable. Employing Lemma 
term: 



6.3 



(11) 

to bound the second 



\O t \\ 2 >y)dy< 



pA 

8(t~l) 
pA 



2(z,-l)e-^- 1 ) 2 / 3 d^ + 2 / e-^ 1 ) 2 /^ 



< 



8 0-l) 3 « c -(q-l) 2 /3 

pA a — 1 



where we define a ~ y/apA/8(t — 1). Using this and the result of Lemma 6.2 in Eq. ( 11 ) 

It -I 



(a-l)VA 



P 



> 



C( 7 ,A) /A 6 



-(a-l) 2 /3 



i- 1 



where the last inequality holds when $\alpha \ge 2$. Using $\alpha(\gamma,\Delta) = 1 + [3\log(96/\Delta C(\gamma,\Delta))]^{1/2} \ge 2$, the second
term in the leading constant is half that of the first, and we get the desired result.



7 Proof of Theorem 2

We now consider the large time horizon $t > p\Delta$ for strategy BallExplore, assuming the special case
$\mathcal{X}_p = \mathrm{Ball}(1)$. Throughout, we will adopt the notation $\bar\beta_t^2 = 1 - \beta_t^2$. To begin, we bound the mean squared
error in estimating $\theta$ using the following

Lemma 7.1 (Upper bound on Squared Error). Under the conditions of Theorem 2, we have for all $t > p\Delta + 1$:

$$
\mathbb{E}(\mathrm{Tr}(\Sigma_t)) \le C_4(\Delta)\sqrt{\frac{p}{t}},
$$

where $C_4(\Delta) = 3(\Delta+1)/\sqrt{\Delta}$.

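Proof. As $\Sigma_t = (\Sigma_{t-1}^{-1} + \sigma^{-2} x_t x_t^{\mathsf T})^{-1}$, we use the inversion lemma to get:

$$
\begin{aligned}
\mathrm{Tr}(\Sigma_t) &= \mathrm{Tr}(\Sigma_{t-1}) - \frac{x_t^{\mathsf T}\Sigma_{t-1}^2 x_t}{\sigma^2 + x_t^{\mathsf T}\Sigma_{t-1} x_t} \\
&\le \mathrm{Tr}(\Sigma_{t-1}) - \frac{p}{\Delta+1}\,x_t^{\mathsf T}\Sigma_{t-1}^2 x_t,
\end{aligned}
$$

where the inequality follows from $\Sigma_{t-1} \preceq I_p/p$ and $\|x_t\|^2 \le 1$ for each $t$. Using $x_t = \bar\beta_t\,\hat\theta_t/\|\hat\theta_t\| + \beta_t P_t^{\perp} u_t$ and
taking expectations on either side, we obtain: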



E(Tr(E t )) < E(Tr(S(_i)) 



< E(Tr(E t _ x )) 



A + l 



A + l 



where we used $\bar\beta_t^2 - \beta_t^2/p > 0$. This follows because $\beta_t^2 < 2/3 \le p/(p+1)$ when $t > p\Delta$ and $p \ge 2$. Employing
Cauchy-Schwarz twice and substituting for $\beta_t$ we get the following recursion in $\mathbb{E}(\mathrm{Tr}(\Sigma_t))$:



(12) 



E(Tr(E t )) < E(Tr(S t _ x )) - -^^-L^Tr^))] 2 . 
The function $f(z) = z - z^2/b$ is increasing in $z$ when $z \in (0, b/2)$. For the recursion above:

6 = 6(t) = ^y|(A + l) 

> P (A + l) 

>4, 

since $p\Delta \ge 2$ and $p \ge 2$. Also, we know that $\Sigma_t \preceq I_p/p$ and hence $\mathrm{Tr}(\Sigma_t) \le 1$ with probability 1 and
that $\mathbb{E}(\mathrm{Tr}(\Sigma_t))$ is decreasing in $t$. Thus the right hand side of the recursion is increasing in its argument.
A standard induction argument then implies that $\mathbb{E}(\mathrm{Tr}(\Sigma_t))$ is bounded pointwise by the solution to the
following equation:

y(t) = y(t Q ) - c f ^ds, 



with the initial condition $t_0 = p\Delta$, $y(t_0) = 1$, where $c = 2\sqrt{\Delta}/3(\Delta+1)\sqrt{p}$. The solution is explicitly
computed to yield:



E(Tr(E t )) < 



where $d = c\sqrt{p} = 2\sqrt{\Delta}/3(\Delta+1)$. Since the constant term is always positive, we can remove it and obtain
the required result.

□ 

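We can now prove the following result:

Lemma 7.2. For all $t > p\Delta$, under the conditions of Theorem 2,

$$
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big] \le 4e^{1/2}\,C_4(\Delta)\Big(\frac{p}{t}\Big)^{1/2 - 1/(2(p+2))}.
$$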



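Proof. Using the linearity of expectation:

$$
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big]
\le \mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\mathbb{1}(\|\theta\| \le \epsilon)\Big]
+ \mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\mathbb{1}(\|\theta\| > \epsilon)\Big]. \tag{13}
$$
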
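We bound the first term as follows:

$$
\begin{aligned}
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\mathbb{1}(\|\theta\| \le \epsilon)\Big]
&\le \mathbb{E}\Big[\|\theta\|\,\Big\|\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big\|\,\mathbb{1}(\|\theta\| \le \epsilon)\Big] \\
&\le 2\epsilon\,\mathbb{P}(\|\theta\| \le \epsilon) \\
&\le 2\epsilon^{p+1}e^{p/2}.
\end{aligned}
$$

The first inequality is Cauchy-Schwarz, the second follows from bounds on the norm of either vector, while
the third is a standard Chernoff bound computation using the fact that $\theta \sim N(0, I_p/p)$. The second term
can be bounded as follows: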



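$$
\begin{aligned}
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\mathbb{1}(\|\theta\| > \epsilon)\Big]
&\le \mathbb{E}\Big[\frac{2\|\theta - \hat\theta_t\|^2}{\|\theta\|}\,\mathbb{1}(\|\theta\| > \epsilon)\Big] \\
&\le \frac{2}{\epsilon}\,\mathbb{E}(\|\theta - \hat\theta_t\|^2) \\
&\le \frac{2}{\epsilon}\,\mathbb{E}(\mathrm{Tr}\,\Sigma_t).
\end{aligned}
$$

The first inequality follows from Lemmas 3.5 and 3.6 of [RT10], the second follows from the fact that $\|\theta - \hat\theta_t\|$
is nonnegative and the indicator is used. Combining the bounds above and Lemma 7.1 we get: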



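$$
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big] \le 2\epsilon^{p+1}e^{p/2} + \frac{2C_4(\Delta)}{\epsilon}\sqrt{\frac{p}{t}}.
$$

Optimizing over $\epsilon$ we obtain:

$$
\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big] \le 4e^{1/2}\,C_4(\Delta)\Big(\frac{p}{t}\Big)^{1/2 - 1/(2(p+2))}.
$$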



□ 



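Using Lemma 7.2 we can now prove Theorem 2 for the large time horizon. Let $\rho_t$ denote the expected
regret incurred by SmoothExplore at time $t > p\Delta$. By definition, we write it as:

$$
\begin{aligned}
\rho_t &= \mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \bar\beta_t\frac{\hat\theta_t}{\|\hat\theta_t\|} - \beta_t P_t^{\perp} u_t\Big)\Big] \\
&= \mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \bar\beta_t\frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big],
\end{aligned}
$$

as $u_t$ is zero mean conditioned on past observations. We split the first term in two components to get:

$$
\rho_t \le (1 - \bar\beta_t)\,\mathbb{E}\|\theta\| + \bar\beta_t\,\mathbb{E}\Big[\theta^{\mathsf T}\Big(\frac{\theta}{\|\theta\|} - \frac{\hat\theta_t}{\|\hat\theta_t\|}\Big)\Big].
$$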



We know that $0 \le 1 - \bar\beta_t \le \beta_t^2 = \sqrt{4p\Delta/9t}$. We use this and the result of Lemma 7.2 to bound the right
hand side above as:

$$
\rho_t \le \frac{2}{3}\Big(\frac{p\Delta}{t}\Big)^{1/2} + 4\sqrt{e}\,C_4(\Delta)\Big(\frac{p}{t}\Big)^{1/2 - \omega(p)},
$$

where we define $\omega(p) = 1/(2(p+2))$. Summing over the relevant interval and bounding by the corresponding
integrals, we obtain:

$$
\sum_{\ell = p\Delta+1}^{t} \rho_\ell \le \frac{4\sqrt{\Delta}}{3}(pt)^{1/2} + 24(\Delta+1)\sqrt{\frac{e}{\Delta}}\,(pt)^{1/2+\omega(p)} \le C_3(\Delta)\,(pt)^{1/2+\omega(p)},
$$

where $C_3(\Delta) = 4\sqrt{\Delta}/3 + 24(\Delta+1)\sqrt{e}/\sqrt{\Delta}$. We can use $C_3(\Delta) = 70(\Delta+1)/\sqrt{\Delta}$ for
simplicity.
A Properties of the set of arms
A.1 Proof of Lemma 4.1

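We let $\mathcal{X}'_p = \partial\mathrm{Ball}(\rho/\sqrt{3})$, where $\partial S$ denotes the boundary of a set $S$. For each $x \in \mathcal{X}'_p$ denote the projection
orthogonal to it by $P_x^{\perp}$. We use the distribution $\mathbb{P}_x(z)$ induced by:

$$
z = x + \sqrt{\tfrac{2}{3}}\,\rho\,P_x^{\perp} u,
$$

where $u$ is chosen uniformly at random on the unit sphere. This distribution is in fact supported on
$\mathrm{Ball}(\rho) \subseteq \mathcal{X}_p$. Also, we have, for all $x \in \mathcal{X}'_p$, $\mathbb{E}_x(z) = x$. Computing the second moment:

$$
\begin{aligned}
\mathbb{E}_x(zz^{\mathsf T}) &= \mathbb{E}\Big(xx^{\mathsf T} + \frac{2\rho^2}{3}P_x^{\perp}uu^{\mathsf T}P_x^{\perp}\Big) \\
&= xx^{\mathsf T} + \frac{2\rho^2}{3p}P_x^{\perp} \\
&= \frac{2\rho^2}{3p}I_p + \Big(1 - \frac{2}{p}\Big)xx^{\mathsf T},
\end{aligned}
$$

where in the first equality we used linearity of expectation, and that the projection mapping is idempotent.
This yields $\gamma = 2\rho^2/3$. Since $\mathcal{X}'_p = \partial\mathrm{Ball}(\rho/\sqrt{3})$ we obtain $\kappa = \inf_{\theta : \|\theta\| = 1}\sup_{x \in \mathcal{X}'_p}\langle\theta, x\rangle = \rho/\sqrt{3}$. Thus this
construction satisfies Assumption 1. Note that BallExplore is a special case of SmoothExplore
because we can use $\rho = 1$ above when $\mathcal{X}_p = \mathrm{Ball}(1)$.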
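The construction can be checked by simulation. The following is a minimal sketch, assuming the perturbation reads $z = x + \sqrt{2/3}\,\rho\,P_x^{\perp}u$ with $\|x\| = \rho/\sqrt{3}$; the helper names, the dimension, the seed and the sample size are illustrative choices, not part of the original argument.

```python
import math
import random

def sample_unit_sphere(p, rng):
    """Uniform point on the unit sphere in R^p via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def sample_z(x, rho, rng):
    """Draw z = x + sqrt(2/3) * rho * P_x_perp u, with u uniform on the unit sphere."""
    u = sample_unit_sphere(len(x), rng)
    coef = sum(ui * xi for ui, xi in zip(u, x)) / sum(xi * xi for xi in x)
    perp = [ui - coef * xi for ui, xi in zip(u, x)]  # component of u orthogonal to x
    scale = math.sqrt(2.0 / 3.0) * rho
    return [xi + scale * vi for xi, vi in zip(x, perp)]

def second_moment_formula(x, rho):
    """Closed form (2 rho^2 / 3p) I + (1 - 2/p) x x^T, for ||x|| = rho / sqrt(3)."""
    p = len(x)
    return [[(2 * rho ** 2 / (3 * p)) * (i == j) + (1 - 2 / p) * x[i] * x[j]
             for j in range(p)] for i in range(p)]

rng = random.Random(0)
p, rho, n = 4, 1.0, 20000
x = [rho / math.sqrt(3)] + [0.0] * (p - 1)  # a point on the boundary of Ball(rho/sqrt(3))
samples = [sample_z(x, rho, rng) for _ in range(n)]
mean = [sum(z[i] for z in samples) / n for i in range(p)]
moment = [[sum(z[i] * z[j] for z in samples) / n for j in range(p)] for i in range(p)]
max_norm = max(math.sqrt(sum(v * v for v in z)) for z in samples)
```

Here `max_norm` should not exceed `rho` (support in the ball), while `mean` and `moment` should be close to `x` and to `second_moment_formula(x, rho)` up to Monte Carlo error.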



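A.2 Proof of Proposition 4.2

Throughout we will denote by $\mathrm{conv}(S)$ the convex hull of set $S$, and by $\overline{\mathrm{conv}}(S)$ its closure. Also, it is
sufficient to consider Assumption 1.2 for $\|\theta\| = 1$.

It is immediate to see that $\mathrm{Ball}(\kappa) \subseteq \overline{\mathrm{conv}}(\mathcal{X}'_p)$ implies Assumption 1.2. Indeed

$$
\begin{aligned}
\sup\{\langle\theta, x\rangle : x \in \mathcal{X}'_p\} &= \sup\{\langle\theta, x\rangle : x \in \mathrm{conv}(\mathcal{X}'_p)\} \\
&= \max\{\langle\theta, x\rangle : x \in \overline{\mathrm{conv}}(\mathcal{X}'_p)\} \\
&\ge \max\{\langle\theta, x\rangle : x \in \mathrm{Ball}(\kappa)\} \ge \kappa\|\theta\|,
\end{aligned}
$$

where the last inequality follows by taking $x = \kappa\theta/\|\theta\|$.

In order to prove the converse, let

$$
\kappa_0 \equiv \sup\big\{\rho : \mathrm{Ball}(\rho) \subseteq \overline{\mathrm{conv}}(\mathcal{X}'_p)\big\}.
$$

We then have $\mathrm{Ball}(\kappa_0) \subseteq \overline{\mathrm{conv}}(\mathcal{X}'_p)$. Assume by contradiction that $\kappa_0 < \kappa$. Then there exists at least one
point $x_0$ on the boundary of $\overline{\mathrm{conv}}(\mathcal{X}'_p)$ such that $\|x_0\| = \kappa_0$ (else $\kappa_0$ would not be the supremum).

By the supporting hyperplane theorem, there exists a closed half space $\mathcal{H}$ in $\mathbb{R}^p$ such that $\overline{\mathrm{conv}}(\mathcal{X}'_p) \subseteq \mathcal{H}$
and $x_0$ is on the boundary $\partial\mathcal{H}$ of $\mathcal{H}$. It follows that $\mathrm{Ball}(\kappa_0) \subseteq \mathcal{H}$ as well, and therefore $\partial\mathcal{H}$ is tangent to
the ball at $x_0$. Summarizing

$$
\overline{\mathrm{conv}}(\mathcal{X}'_p) \subseteq \mathcal{H} = \big\{x \in \mathbb{R}^p : \langle x, x_0\rangle \le \kappa_0\|x_0\|\big\}. \tag{14}
$$

By taking $\theta = x_0/\|x_0\|$, we then have, for any $x \in \overline{\mathrm{conv}}(\mathcal{X}'_p)$, $\langle\theta, x\rangle \le \kappa_0 < \kappa$, which is in contradiction with
Assumption 1.2.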


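A.3 Proof of Proposition 4.3
A.3.1 Proof of condition 1

Choose $\mathcal{X}'_p = \mathcal{X}_p \cap \mathrm{Ball}(\rho)$. We first prove that $f(\theta) = \max_{x \in \mathcal{X}'_p}\langle\theta, x\rangle$ is Lipschitz continuous with constant
$\rho$. Then, employing a $\nu$-net argument, we prove that this choice of $\mathcal{X}'_p$ satisfies Assumption 1.1 with high
probability.

Let $f(\theta_i) = \langle\theta_i, x_i\rangle$ for $i = 1, 2$. Without loss of generality, assume $f(\theta_1) > f(\theta_2)$. We then have:

$$
\begin{aligned}
|f(\theta_1) - f(\theta_2)| &= |\langle\theta_1, x_1\rangle - \langle\theta_2, x_2\rangle| \\
&= |\langle\theta_1, x_1\rangle - \langle\theta_2, x_1\rangle + \langle\theta_2, x_1\rangle - \langle\theta_2, x_2\rangle| \\
&\le |\langle\theta_1 - \theta_2, x_1\rangle| \\
&\le \|x_1\|\,\|\theta_1 - \theta_2\| \\
&\le \rho\,\|\theta_1 - \theta_2\|,
\end{aligned}
$$

where the first inequality follows since $x_2$ maximizes $\langle\theta_2, x\rangle$, the second is Cauchy-Schwarz and the third
from the fact that $x_1 \in \mathcal{X}_p \cap \mathrm{Ball}(\rho)$.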
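The Lipschitz property is easy to probe numerically. The sketch below draws a random finite arm set inside $\mathrm{Ball}(\rho)$ and compares $|f(\theta_1) - f(\theta_2)|$ with $\rho\|\theta_1 - \theta_2\|$ over random pairs; the arm count, dimension and seed are arbitrary illustrative choices.

```python
import math
import random

def random_point_in_ball(p, radius, rng):
    """Uniform point in the Euclidean ball of the given radius in R^p."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / p)
    return [r * v / norm for v in g]

def support_value(theta, arms):
    """f(theta) = max over arms of <theta, x>."""
    return max(sum(t * a for t, a in zip(theta, arm)) for arm in arms)

rng = random.Random(1)
p, rho = 5, 0.5
arms = [random_point_in_ball(p, rho, rng) for _ in range(200)]

worst_ratio = 0.0
for _ in range(1000):
    theta1 = [rng.gauss(0.0, 1.0) for _ in range(p)]
    theta2 = [rng.gauss(0.0, 1.0) for _ in range(p)]
    gap = abs(support_value(theta1, arms) - support_value(theta2, arms))
    worst_ratio = max(worst_ratio, gap / math.dist(theta1, theta2))
```

By the argument above, `worst_ratio` can never exceed `rho`, whatever arm set inside the ball is used.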

Since $f(\theta) = \|\theta\|\,f(\theta/\|\theta\|)$, it suffices to consider $\theta$ on the unit sphere $S^p$. Suppose $T$ is a $\nu$-net of the
unit sphere, i.e. a maximal set of points that are separated from each other by at least $\nu$. We can bound
$|T|$ by a volume packing argument: consider balls of radius $\nu/2$ around every point in $T$. These balls are
pairwise disjoint (by the property of a $\nu$-net) and, by the triangle inequality, are all contained in a ball of radius
$1 + \nu/2$. The latter has a volume of $(1 + 2\nu^{-1})^p$ times that of each of the smaller balls, thus yielding that
$|T| \le (1 + 2\nu^{-1})^p$.
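The packing bound can be illustrated with a greedy construction. The sketch below builds a $\nu$-separated subset of random points on the unit sphere and compares its size with $(1 + 2/\nu)^p$. The greedy set is separated but not necessarily maximal over the whole sphere; the bound applies to any $\nu$-separated set, so the comparison is still valid. All names and parameter values are illustrative.

```python
import math
import random

def packing_bound(nu, p):
    """Volume-packing upper bound (1 + 2/nu)^p on the size of a nu-separated set."""
    return (1.0 + 2.0 / nu) ** p

def random_unit_vector(p, rng):
    """Uniform point on the unit sphere in R^p."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return tuple(v / norm for v in g)

def greedy_separated_set(points, nu):
    """Keep a point only if it is at distance >= nu from every point kept so far."""
    kept = []
    for q in points:
        if all(math.dist(q, c) >= nu for c in kept):
            kept.append(q)
    return kept

rng = random.Random(0)
p, nu = 3, 0.5
candidates = [random_unit_vector(p, rng) for _ in range(2000)]
net = greedy_separated_set(candidates, nu)
```

With $\nu = 1/2$ the bound is $5^p$, which is the factor that appears in the union bound later in this proof.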

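Now, $|\mathcal{X}'_p|$ is binomial with mean $M\rho^p$ and variance $M\rho^p(1 - \rho^p)$. Consider a single point $\theta \in T$. Due
to rotational invariance we may assume $\theta = e_1$, the first canonical basis vector. Conditional on the event
$E_n = \{\omega : |\mathcal{X}'_p| = n\}$, the arms in $\mathcal{X}'_p$ are uniformly distributed in $\mathrm{Ball}(\rho)$. Thus we have (assuming $z > 0$):

$$
\begin{aligned}
\mathbb{P}\Big(\max_{x \in \mathcal{X}'_p}\langle x, e_1\rangle \le z\rho \,\Big|\, E_n\Big) &= \prod_{j=1}^{n}\mathbb{P}\big(\langle x_j, e_1\rangle \le z\rho \,\big|\, E_n\big) && (15) \\
&= \big(\mathbb{P}(\langle x_1, e_1\rangle \le z\rho \mid x_1 \in \mathrm{Ball}(\rho))\big)^n && (16) \\
&= \big(\mathbb{P}(\langle x_1, e_1\rangle \le z)\big)^n, && (17)
\end{aligned}
$$

since the $\langle x_j, e_1\rangle$, $j \in \{1, \ldots, n\}$ are iid, and the conditional distribution of $x_1$ given $x_1 \in \mathrm{Ball}(\rho)$ is the same as
the unconditional distribution of $\rho x_1$. Let $Y_1, \ldots, Y_p \sim N(0, 1/2)$ be iid and $Z \sim \mathrm{Exp}(1)$ be independent of the
$Y_i$. Then by Theorem 1 of [BGMN05], for $x_1$ uniform in $\mathrm{Ball}(\rho)$, $\langle x_1, e_1\rangle$ is distributed as $\rho Y_1/(\sum_{i=1}^{p} Y_i^2 + Z)^{1/2}$. By a standard Chernoff
argument, $\mathbb{P}(\sum_{i=2}^{p} Y_i^2 \ge 2(p-1)) \le \exp\{-c(p-1)\}$ where $c = (1 - \log 2)/2$. Also, $\mathbb{P}(Z \ge p) = \exp(-p)$ and
$\mathbb{P}(Y_1^2 \ge p) \le 2\exp(-p)$. This allows us the following bound: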



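$$
\mathbb{P}(\langle x_1, e_1\rangle \le z) \le \nu(p) + (1 - \nu(p))\,\mathbb{P}\Big(\frac{Y_1}{\sqrt{4p-2}} \le z \,\Big|\, Y_1^2 \le p\Big),
$$

where $\nu(p) = 3\exp(-p) + \exp(-c(p-1))$. We further simplify to obtain

$$
\begin{aligned}
\mathbb{P}(\langle x_1, e_1\rangle \le z) &\le 1 - (1 - \nu(p))\,\mathbb{P}\Big(\frac{Y_1}{\sqrt{4p-2}} \ge z \,\Big|\, Y_1^2 \le p\Big) \\
&\le 1 - (1 - \nu(p))\big(F_G(\sqrt{2p}) - F_G(z\sqrt{8p-4})\big),
\end{aligned}
$$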



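and $F_G(\cdot)$ denotes the Gaussian cumulative distribution function. Employing this in Eq. (17):

$$
\begin{aligned}
\mathbb{P}\Big(\max_{x \in \mathcal{X}'_p}\langle x, e_1\rangle \le z\rho \,\Big|\, E_n\Big) &\le \Big[1 - (1 - \nu(p))\big(F_G(\sqrt{2p}) - F_G(z\sqrt{8p-4})\big)\Big]^n \\
&\le \exp\Big[-n(1 - \nu(p))\big(F_G(\sqrt{2p}) - F_G(z\sqrt{8p-4})\big)\Big].
\end{aligned}
$$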

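For $p \ge 6$, we have that $1 - \nu(p) \ge 1/2$ and, at $z = 1/2$, $F_G(\sqrt{2p}) - F_G(z\sqrt{8p-4}) \ge 3^{-p}/2$. Using this with
$z = 1/2$, and the fact that $|\mathcal{X}'_p| \ge M\rho^p/2$ with probability at least $1 - \exp(-M\rho^p/8)$, we now have:

$$
\mathbb{P}\Big(\max_{x \in \mathcal{X}'_p}\langle x, e_1\rangle \le \rho/2\Big) \le \exp(-M\rho^p 3^{-p}/4) + \exp(-M\rho^p/8).
$$

We may now union bound over $T$ using rotational invariance to obtain:

$$
\mathbb{P}\Big(\min_{\theta \in T}\max_{x \in \mathcal{X}'_p}\langle x, \theta\rangle \le \rho/2\Big) \le (1 + 2\nu^{-1})^p\big(\exp(-M\rho^p 3^{-p}/4) + \exp(-M\rho^p/8)\big).
$$

Using $\rho = 1/2$, $\nu = 1/2$, $M = 8^p$ and that $f(\theta)$ is Lipschitz, we then obtain:

$$
\begin{aligned}
\mathbb{P}\Big(\min_{\|\theta\| = 1}\max_{x \in \mathcal{X}'_p}\langle x, \theta\rangle \le 1/4\Big) &\le 5^p\big[\exp(-4^{p-1}/3^p) + \exp(-4^p/8)\big] \\
&\le \exp(-p),
\end{aligned}
$$

when $p \ge 20$.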
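The last numerical step can be checked directly. The sketch below evaluates the logarithm of $(1 + 2/\nu)^p[\exp(-M\rho^p 3^{-p}/4) + \exp(-M\rho^p/8)]$ with $M = 8^p$ and compares it with $-p$. It works in log space because the exponentials underflow in floating point; the function name and default arguments are illustrative.

```python
import math

def log_failure_bound(p, rho=0.5, nu=0.5):
    """Log of (1 + 2/nu)^p * [exp(-M rho^p 3^-p / 4) + exp(-M rho^p / 8)], with M = 8^p."""
    m_rho_p = (8.0 * rho) ** p            # M * rho^p for M = 8^p
    a = -m_rho_p / (4.0 * 3.0 ** p)       # exponent of the first term
    b = -m_rho_p / 8.0                    # exponent of the second term
    top = max(a, b)
    # log-sum-exp keeps the computation stable when exp(a) or exp(b) underflows
    log_sum = top + math.log(math.exp(a - top) + math.exp(b - top))
    return p * math.log(1.0 + 2.0 / nu) + log_sum

holds_at_20 = log_failure_bound(20) <= -20
```

The first exponential dominates the sum, so the comparison is essentially between $p\log 5 - (4/3)^p/4$ and $-p$.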

A.3.2 Proof of condition 2

Fix radii $\rho$ and $\delta$ such that $\rho + \delta \le 1$. We choose $\mathcal{X}'_p$ to be $\mathcal{X}_p \cap \mathrm{Ball}(\rho)$. Consider a point $x_i$ such that
$\|x_i\| \le 1 - \delta$. We consider the events $E_i$, $D_i$:

$$
\begin{aligned}
E_i &= \{\nexists \text{ a distribution } \mathbb{P}_{x_i} \text{ satisfying Assumption 1.2}\}, \\
D_i &= \{x_i \in \mathrm{Ball}(\rho)\}.
\end{aligned}
$$

We now bound $\mathbb{P}(E_i \mid D_i)$. Within a distance $\delta$ around $x_i$, there will be, in expectation, $M\delta^p$ arms (assuming
the total number of points is $M + 1$). Indeed the distribution of the number of arms within distance $\delta$ around
$x_i$ is binomial with mean $M\delta^p$ and variance $M\delta^p(1 - \delta^p)$.

Conditional on the number of arms in $\mathrm{Ball}(\delta, x_i)$ being $n$, these arms are uniformly distributed in $\mathrm{Ball}(\delta, x_i)$
and are independent of $x_i$. We will use $\mathbb{P}_n$ to denote this conditional probability measure. Denote the
arms within distance $\delta$ from $x_i$ to be $v_1, v_2, \ldots, v_n$. Define $u_j = v_j - x_i$ for all $j$, $\bar u = (\sum_{j=1}^{n} u_j)/n$ and
$Q = (\sum_{j=1}^{n} u_j u_j^{\mathsf T})/n$. To construct the probability distribution $\mathbb{P}_{x_i}$, we let the weight $w_j$ on the arm $v_j$ to
be:

$$
w_j = \frac{1}{n}\left(\frac{1 - u_j^{\mathsf T}Q^{-1}\bar u}{1 - \bar u^{\mathsf T}Q^{-1}\bar u}\right).
$$
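As a sketch of why these weights are admissible, the following short derivation verifies that they sum to one and that the weighted displacements cancel. It assumes the reading $u_j = v_j - x_i$, $\bar u = \frac{1}{n}\sum_j u_j$, $Q = \frac{1}{n}\sum_j u_j u_j^{\mathsf T}$ with $Q$ invertible, and $w_j = \frac{1}{n}\,(1 - u_j^{\mathsf T}Q^{-1}\bar u)/(1 - \bar u^{\mathsf T}Q^{-1}\bar u)$.

```latex
\begin{align*}
\sum_{j=1}^{n} w_j
  &= \frac{1 - \big(\tfrac{1}{n}\sum_{j} u_j\big)^{\mathsf T} Q^{-1}\bar u}
          {1 - \bar u^{\mathsf T} Q^{-1}\bar u}
   = \frac{1 - \bar u^{\mathsf T} Q^{-1}\bar u}{1 - \bar u^{\mathsf T} Q^{-1}\bar u} = 1, \\
\sum_{j=1}^{n} w_j u_j
  &= \frac{\bar u - \big(\tfrac{1}{n}\sum_{j} u_j u_j^{\mathsf T}\big) Q^{-1}\bar u}
          {1 - \bar u^{\mathsf T} Q^{-1}\bar u}
   = \frac{\bar u - Q\,Q^{-1}\bar u}{1 - \bar u^{\mathsf T} Q^{-1}\bar u} = 0, \\
\sum_{j=1}^{n} w_j v_j
  &= x_i \sum_{j=1}^{n} w_j + \sum_{j=1}^{n} w_j u_j = x_i .
\end{align*}
```

The denominator is positive whenever the $u_j$ are not all equal, since $Q - \bar u\bar u^{\mathsf T}$ is the empirical covariance of the $u_j$.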

It is easy to check that these weights yield the correct first moment, i.e. $\sum_{j=1}^{n} w_j v_j = x_i$. Before considering
the second moment, we first show that $Q$ concentrates around its mean. It is straightforward to compute
that $\mathbb{E}(Q) = \mathbb{E}(u_1 u_1^{\mathsf T}) = \mu I_p$, where $\mu = \delta^2/(p+2)$. By the matrix Chernoff bound [AW02, Tro12], there
exists $c > 0$ such that:

$$
\mathbb{P}_n\Big(\|Q^{-1}\| \ge \frac{2}{\mu}\Big) \le p\exp(-cn\mu/\delta^2), \tag{18}
$$

where $\|Q\|$ denotes the operator norm and the probability is over the distribution of the $u_j$. We further
have, for all $j$:

$$
w_j \ge \frac{1}{n}\big(1 - \|\bar u\|\,\|Q^{-1}u_j\|\big) \ge \frac{1}{n}\big(1 - \delta\,\|Q^{-1}\|\,\|\bar u\|\big). \tag{19}
$$

Also, using Theorem 2.1 of [JN08] we obtain that:

$$
\begin{aligned}
\mathbb{P}_n\big(\|\bar u\| \ge \delta/n^{1/4}\big) &\le \exp\big\{-(n^{1/4} - 1)^2\big\} \\
&\le \exp(-n^{1/2}/4),
\end{aligned}
$$

for $n \ge 16$. Combining this with Eq. (18) and continuing inequalities in Eq. (19), we obtain, for all $j$:

$$
w_j \ge \frac{1}{n}\Big(1 - \frac{2(p+2)}{n^{1/4}}\Big),
$$

with probability at least $1 - \omega(n,p)$ where $\omega(n,p) = p\exp(-cn/2p) + \exp(-n^{1/2})$. We can now bound the
second moment of $\mathbb{P}_{x_i}$:



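$$
\begin{aligned}
\sum_{j=1}^{n} w_j v_j v_j^{\mathsf T} &= x_i x_i^{\mathsf T} + \sum_{j=1}^{n} w_j u_j u_j^{\mathsf T} \\
&\succeq \sum_{j=1}^{n} w_j u_j u_j^{\mathsf T} \\
&\succeq \frac{1}{2}\Big(1 - \frac{2(p+2)}{n^{1/4}}\Big)\frac{\delta^2}{p+2}\,I_p,
\end{aligned}
$$

where the last inequality holds with probability at least $1 - \omega(n,p)$. Thus we can obtain $\gamma = \delta^2/8$ for
$n \ge [4(p+2)]^4$.

In addition, a standard Chernoff bound argument yields that the number of arms in $\mathrm{Ball}(\delta, x_i)$ is at least
$M\delta^p/2$ with probability at least $1 - \exp(-M\delta^p/8)$. With this, we can bound $\mathbb{P}(E_i \mid D_i)$:

$$
\mathbb{P}(E_i \mid D_i) \le \exp(-M\delta^p/8) + \omega(M\delta^p/2, p).
$$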

The event $F$ that the uniform cloud does not satisfy Assumption 1.2 can now be decomposed as follows:

$$
\begin{aligned}
\mathbb{P}(F) &= \mathbb{P}\Big(\bigcup_{i=1}^{M+1}(E_i \cap D_i)\Big) \\
&\le \sum_{i=1}^{M+1}\mathbb{P}(E_i \mid D_i)\,\mathbb{P}(D_i) \\
&\le 2M\rho^p\big\{\exp(-M\delta^p/8) + \omega(M\delta^p/8, p)\big\}.
\end{aligned}
$$

Choosing $\delta = \rho = 1/2$, with $M = 8^p$, we get that the uniform cloud satisfies Assumption 1.2 with
$\gamma \ge \delta^2/8 = 1/32$ with probability at least $1 - 2 \cdot 4^p[\exp(-4^p/8) + \omega(4^p/8, p)] \ge 1 - \exp(-p)$ when $p \ge 10$.
Summarizing the proofs of both conditions: choosing the number of points $M = 8^p$ and the subset
$\mathcal{X}'_p = \mathcal{X}_p \cap \mathrm{Ball}(1/2)$, we obtain constants $\kappa = 1/4$ and $\gamma = 1/32$ with probability at least $1 - 2\exp(-p)$,
provided $p \ge 20$.